Evaluating the Use of a Language Model to Crowdsource Gun Violence Reports
Social media is a valuable source for crowdsourcing evidence in human rights monitoring and investigations, but…
Keyword-based search leads to a high proportion of unrelated text
Small teams can’t process high volumes of data
Machine learning models show promise (Alhelbawy et al., 2020; Pilankar et al., 2022), but gaps remain:
Few real-world evaluations
Unknown impact on workflows
No previous work with Portuguese texts
We deployed an open-source language model to assist the crowdsourcing of gun violence reports from social media.
Partnering with a Brazilian organization allowed us to systematically evaluate its application in a real-world setting in 2023.
Fogo Cruzado (“Crossfire”) monitors events of gun violence in four Brazilian cities.
A small team of analysts tracks social media posts 24/7.
They have been interacting with users who report gun violence on Twitter/X for years.
They relied on keyword-based search with geographical filters on TweetDeck.
RQ1 - Can Transformer-based language models accurately identify gun violence reports in Brazilian Portuguese social media texts?
RQ2 - What are the advantages and challenges of adopting a language model for real-time monitoring compared to manually reviewing social media texts?
RQ1: Model development Fine-tuned a BERT-based model on Portuguese tweets using past analyst interactions as training data.
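The training data came from past analyst interactions. A minimal sketch of how such interaction logs could be turned into weak labels for fine-tuning (all field names and the replied-tweets-as-positives heuristic are illustrative assumptions, not details from the paper):

```python
# Hypothetical sketch: deriving weak labels from past analyst interactions.
# Assumption (illustrative): tweets analysts replied to are positive examples
# of gun violence reports; other collected tweets serve as negatives.

def build_weak_labels(tweets, replied_ids):
    """Return (text, label) pairs: 1 if analysts engaged with the tweet."""
    dataset = []
    for tweet in tweets:
        label = 1 if tweet["id"] in replied_ids else 0
        dataset.append((tweet["text"], label))
    return dataset

tweets = [
    {"id": "t1", "text": "Tiroteio agora na Avenida Brasil"},   # shooting report
    {"id": "t2", "text": "O paredão do BBB hoje vai ser caótico"},  # unrelated TV chatter
]
dataset = build_weak_labels(tweets, replied_ids={"t1"})
# dataset == [("Tiroteio agora na Avenida Brasil", 1),
#             ("O paredão do BBB hoje vai ser caótico", 0)]
```

Pairs like these could then feed a standard BERT fine-tuning pipeline for binary classification.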
RQ2: Real-world deployment We built a visual interface to help analysts navigate the model’s predictions and evaluated its adoption with mixed methods: surveys, interviews, and interaction metrics analyzed with a difference-in-differences model.
A BERT-based model in Portuguese achieved good performance. Inference can be performed on CPUs.
Tweets were updated every fifteen minutes.
The prototype effectively filtered out less relevant social media content.
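The deployment loop can be sketched as batched scoring and thresholding; the scorer below is a stub standing in for CPU inference with the fine-tuned model, and the threshold value is an illustrative assumption:

```python
# Minimal sketch of the monitoring loop (names and values are illustrative).
# Assumption: tweets arrive in batches on the fifteen-minute refresh cycle
# reported in the evaluation, and a classifier exposes score(text) -> float.

REFRESH_SECONDS = 15 * 60   # update interval of the deployed prototype
THRESHOLD = 0.5             # illustrative decision threshold

def score(text):
    # Stub standing in for CPU inference with the fine-tuned BERT model.
    return 0.9 if "tiroteio" in text.lower() else 0.1

def filter_batch(batch, threshold=THRESHOLD):
    """Keep only tweets the model flags as likely gun violence reports."""
    return [t for t in batch if score(t) >= threshold]

batch = ["Tiroteio na Penha agora", "Quem sai do BBB hoje?"]
flagged = filter_batch(batch)
# flagged == ["Tiroteio na Penha agora"]
```

Flagged tweets would then be surfaced in the visual interface for analysts to review.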
Interview with an analyst
“[Now] I do not have to go hunting for tweets.
Sometimes, I missed them [gun violence reports] because there were too many [unrelated] messages. During the BBB [Big Brother Brasil, an annual TV show extremely popular on Twitter], it was chaotic […]. It was literally a treasure hunt”
Our prototype removed the need for restrictive geolocation filters, allowing analysts to expand their search scope.
We estimated that analysts using the model engaged in nine additional daily interactions with users reporting events.
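The effect was estimated with a difference-in-differences design. A worked toy example of the classic 2x2 DiD computation (the numbers are made up for illustration, not the study's data):

```python
# Illustrative difference-in-differences calculation (synthetic numbers,
# not the study's data). The estimate compares the change in daily
# interactions for analysts using the model against a comparison group.

def diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Classic 2x2 DiD: (change in treated group) minus (change in controls)."""
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)

# Hypothetical mean daily interactions with users reporting events:
effect = diff_in_diff(treat_pre=20, treat_post=31, ctrl_pre=22, ctrl_post=24)
# effect == 9: nine additional daily interactions attributable to the model
```

Subtracting the control group's change nets out trends that would have occurred without the model.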
The interviews and surveys allowed us to identify three major shortcomings:
The delay between updates: surfacing new tweets promptly is critical.
Static keywords used for search: terms need to be set dynamically to monitor live conflicts.
Use of text-only features: profile images also help analysts decide whether to interact with users.
🤝 AI and small language models can amplify crowdsourcing, not replace human judgment
📊 Real-world evaluation matters: lab performance ≠ practical impact
⚠️ Platform dependencies are fragile: loss of API access is a critical risk
Ayman Alhelbawy, Mark Lattimer, Udo Kruschwitz, Chris Fox, and Massimo Poesio. An NLP-Powered Human Rights Monitoring Platform. Expert Systems with Applications, 153, 2020. ISSN 0957-4174. https://doi.org/10.1016/j.eswa.2020.113365.
Yash Pilankar, Rejwanul Haque, Mohammed Hasanuzzaman, Paul Stynes, and Pramod Pathak. Detecting Violation of Human Rights via Social Media. In Proceedings of the First Computing Social Responsibility Workshop within the 13th Language Resources and Evaluation Conference, pages 40–45. European Language Resources Association, 2022. https://aclanthology.org/2022.csrnlp-1.6.
Hoang Thang Ta, Abu Bakar Siddiqur Rahman, Lotfollah Najjar, and Alexander Gelbukh. GAN-BERT: Adversarial Learning for Detection of Aggressive and Violent Incidents from Social Media. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2022), CEUR Workshop Proceedings, 2022. https://ceur-ws.org/Vol-3202/davincis-paper7.pdf.
Thank you!
📧 Email: adriano@belisario.website
🌐 Website: belisario.website
👨🏻🏫 Presentation: belisario.website/crossfire_paper/